AD-A119 609  EDUCATIONAL TESTING SERVICE PRINCETON NJ  F/G 12/1
SAMPLING VARIANCES AND COVARIANCES OF PARAMETER ESTIMATES IN IT--ETC(U)
AUG 82  F M LORD, M S WINGERSKY  N00014-80-C-0402
UNCLASSIFIED  ETS-RR-82-33-ONR  NL


RR-82-33-ONR 


SAMPLING  VARIANCES  AND  COVARIANCES 
OF  PARAMETER  ESTIMATES  IN 
ITEM  RESPONSE  THEORY 


Frederic  M.  Lord 
and 

Marilyn  S.  Wingersky 




This  research  was  sponsored  in  part  by  the 
Personnel  and  Training  Research  Programs 
Psychological  Sciences  Division 
Office  of  Naval  Research,  under 
Contract  No.  N00014-80-C-0402 

Contract  Authority  Identification  Number 
NR  No.  150-453 


Frederic  M.  Lord,  Principal  Investigator 

Educational  Testing  Service 
Princeton,  New  Jersey 

August  1982 



Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government. 


Approved  for  public  release;  distribution 
unlimited. 




SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER:
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Sampling Variances and Covariances of Parameter
   Estimates in Item Response Theory
5. TYPE OF REPORT & PERIOD COVERED: Technical Report
6. PERFORMING ORG. REPORT NUMBER: RR-82-33-ONR
7. AUTHOR(s): Frederic M. Lord and Marilyn S. Wingersky
8. CONTRACT OR GRANT NUMBER(s): N00014-80-C-0402
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Educational Testing Service,
   Princeton, NJ 08541
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: NR 150-453
11. CONTROLLING OFFICE NAME AND ADDRESS: Personnel and Training Research
    Programs, Office of Naval Research (Code 458), Arlington, VA 22217
12. REPORT DATE: August 1982
13. NUMBER OF PAGES: 36
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): Unclassified
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different
    from Report):
18. SUPPLEMENTARY NOTES:
19. KEY WORDS (Continue on reverse side if necessary and identify by block
    number): Sampling Covariances; Standard Errors; Item Response Theory;
    Maximum Likelihood Estimators; Item Parameters; Matrix Inversion
20. ABSTRACT (Continue on reverse side if necessary and identify by block
    number):

This paper develops a possible method for computing the asymptotic
sampling variance-covariance matrix of joint maximum likelihood estimates in
item response theory when both item parameters and abilities are unknown.

For a set of artificial data, results are compared with empirical values;
also with the variance-covariance matrices found by the usual formulas for the
case where the abilities are known, or where the item parameters are known.

The results are consistent with the conjecture that the new method is
asymptotically correct except for errors due to grouping.


Sampling Variances and Covariances
of Parameter Estimates in Item Response Theory

Abstract 

This  paper  develops  a  possible  method  for  computing  the  asymptotic 
sampling  variance-covariance  matrix  of  joint  maximum  likelihood  estimates 
in  item  response  theory  when  both  item  parameters  and  abilities  are 
unknown.  For  a  set  of  artificial  data,  results  are  compared  with  empirical 
values;  also  with  the  variance-covariance  matrices  found  by  the  usual 
formulas  for  the  case  where  the  abilities  are  known,  or  where  the  item 
parameters  are  known.  The  results  are  consistent  with  the  conjecture 
that  the  new  method  is  asymptotically  correct  except  for  errors  due  to 
grouping. 


Sampling Variances and Covariances
of  Parameter  Estimates  in  Item  Response  Theory* 

In item response theory (IRT), the observations come in the form
of an $n$-by-$N$ matrix, with one row for each item and one column for
each examinee. The joint frequency distribution of the observations
depends on a vector of $N$ 'ability' parameters, one for each person,
and on a matrix of item parameters. Here, we will consider only the
three-parameter logistic model for dichotomously scored items, so there
will be three item parameters ($a$, $b$, and $c$) for each of $n$
items. A method will be developed for computing the asymptotic sampling
variance-covariance matrix when both abilities and item parameters are
unknown. Until this is done we do not know the standard errors of the
parameter estimates, which handicaps development of a goodness-of-fit test
and other statistics required in applications of IRT.

If the item (ability) parameters are known, the estimated ability
(item) parameters have independent sampling distributions. It can be
shown (see Bradley & Gart, 1962) that the maximum likelihood estimates
of the ability (item) parameters are consistent. Hence the asymptotic
sampling variance for an estimated ability parameter is given by the
usual formula

$$\operatorname{Var}(\hat\theta \mid a, b, c) = \left[ E\left( \frac{\partial \ell}{\partial \theta} \right)^{2} \right]^{-1}, \tag{1a}$$

where $\hat\theta$ is the estimated ability parameter, $\ell$ is the log of the
likelihood, and $a$, $b$, and $c$ are the known vectors of item parameters.
Similarly the asymptotic sampling variance-covariance matrix of the
estimated item parameters for an item is given by

$$\left\| \operatorname{Cov}(\hat\tau_v, \hat\tau_w \mid \theta) \right\| = \left\| E\left( \frac{\partial \ell}{\partial \tau_v} \, \frac{\partial \ell}{\partial \tau_w} \right) \right\|^{-1} \qquad (v, w = 1, 2, 3), \tag{1b}$$

where $\hat\tau$ is a vector consisting of the estimated $a$, $b$, and $c$
for a single item and $\theta$ is the known vector of abilities.
The right-hand side is the inverse of a 3-by-3 matrix.

*This work was supported in part by contract N00014-80-C-0402,
project designation NR 150-453, between the Office of Naval Research
and Educational Testing Service. Reproduction in whole or in part is
permitted for any purpose of the United States Government.

When neither item nor ability parameters are known, all parameters
are often estimated simultaneously by maximum likelihood. In
the (Rasch) case where there is only one parameter per item, Haberman
(1977) has shown that all parameter estimates will converge to their
true values (will be consistent) when the number of examinees and the
number of test items become large simultaneously. Empirical results
suggest that consistency probably also holds when all parameters are
estimated simultaneously under the three-parameter model. If so,
it is reasonable that the asymptotic sampling variance-covariance matrix
of all estimated parameters will be given by the usual formula

$$\left\| \operatorname{Cov}(\hat\tau_p, \hat\tau_q) \right\| = \left\| E\left( \frac{\partial \ell}{\partial \tau_p} \, \frac{\partial \ell}{\partial \tau_q} \right) \right\|^{-1} \qquad (p, q = 1, 2, \ldots, M), \tag{2}$$

where $M = 3n + N - 2$ and

$$\tau' \equiv \{ a_1, b_1, c_1, a_2, b_2, c_2, \ldots, a_n, b_n, c_n, \theta_1, \ldots, \theta_{N-2} \}.$$

Since standard errors are urgently needed in practical work
where all parameters are estimated simultaneously by maximum likelihood,
this report compares numerical values provided by (2) with values provided
by (1) and with empirically observed sampling fluctuations. The
comparisons to be presented suggest that (2) provides useful values for
the desired standard errors.

There are several special problems that arise in the evaluation
and practical utilization of (2), problems that do not arise in the
situation where (1) is appropriate:

1. Until an origin and scale are specified, the parameters
are not identifiable.

2. The mathematical formulation is complicated by the choice
of origin and scale.

3. The usual choice of origin and scale when estimating IRT
parameters is inconvenient for mathematical purposes.

4. The numerical values of the sampling variances are very
much affected by the choice of origin and scale.

5. Equation (2) requires the inversion of a matrix of order
$N + 3n - 2$ where $N$ may be several thousand.

These problems will be considered in subsequent sections.

1. Parameterization

The appropriate likelihood function is (Lord, 1980)

$$L(a, b, c; \theta \mid U) = \prod_{i=1}^{n} \prod_{a=1}^{N} P_{ia}^{u_{ia}} Q_{ia}^{1 - u_{ia}}, \tag{3}$$

where $\theta$ is the vector of the $N$ ability parameters; $a$, $b$, and
$c$ are each a vector of $n$ item parameters; $U \equiv \| u_{ia} \|$ is the
matrix of item responses $u_{ia}$ (= 0 or 1); finally $Q_{ia} \equiv 1 - P_{ia}$
and $P_{ia}$ is the item response function, the probability of a correct
answer by examinee $a$ to item $i$. Each given $P_{ia}$ is a function
of $\theta_a$ and of $a_i$, $b_i$, and $c_i$, but not of any other parameters.
In numerical work here, $P_{ia}$ will be taken to be the three-parameter
logistic function

$$P_{ia} \equiv c_i + \frac{1 - c_i}{1 + \exp[-1.7 a_i (\theta_a - b_i)]}. \tag{4}$$

For mathematical purposes, however, it is only necessary to state that
$P_{ia}$ is an increasing function of $\theta_a$.

If we add some constant to all $\theta_a$ and add the same constant
to all $b_i$, all $P_{ia}$ will be unchanged, since $P_{ia}$ depends on
$\theta_a$ and $b_i$ only through $\theta_a - b_i$. This means that the origin
used for measuring ability is entirely arbitrary. If we multiply each
$\theta_a$ and each $b_i$ by some constant and divide each $a_i$ by the same
constant, again all $P_{ia}$ will be unchanged. This means that the unit
used to measure ability is entirely arbitrary. Since we can change
the origin and unit of the $\theta_a$ without changing (3), it follows that
$\theta$, $a$, $b$, and $c$ are not identifiable and cannot be estimated from
(3) without further specification.
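
The indeterminacy of origin and unit can be checked numerically. The following is a minimal sketch, assuming the three-parameter logistic form with the 1.7 scaling constant; the function name `p3pl` and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical parameter values, for illustration only.
theta, a, b, c = 0.4, 1.2, -0.3, 0.17
p_original = p3pl(theta, a, b, c)

# Change of origin: the same constant is added to theta and to b,
# so that theta - b is unchanged.
shift = 2.5
p_shifted = p3pl(theta + shift, a, b + shift, c)

# Change of unit: theta and b are multiplied by a constant and
# a is divided by the same constant, so that a * (theta - b) is unchanged.
unit = 3.0
p_rescaled = p3pl(theta * unit, a / unit, b * unit, c)

print(p_original, p_shifted, p_rescaled)
```

All three probabilities agree to rounding error, which is why the likelihood alone cannot fix the origin and unit of the ability scale.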

To conform to a commonly used procedure, we could choose the
origin and scale so that for some specified group of examinees the
mean of the $\theta_a$ is zero and the variance is one. This is not
convenient mathematically, however. Instead, two other methods of
specifying the origin and scale will be used, even though this will
complicate matters later on when the results are applied in
practice. In the first method, without loss of generality, arbitrary
numerical values will be assigned to $\theta_{N-1}$ and to $\theta_N$.

The $M = N + 3n - 2$ likelihood equations are

$$0 = \frac{\partial \ell}{\partial \tau_p} = \sum_{i=1}^{n} \sum_{a=1}^{N} \frac{(u_{ia} - P_{ia}) \, P^{p}_{ia}}{P_{ia} Q_{ia}} \qquad (p = 1, 2, \ldots, M), \tag{5}$$

where $P^{p}_{ia} \equiv \partial P_{ia} / \partial \tau_p$.


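2. Fisher Information Matrix


The Fisher information matrix on the right of (2) now has as a
typical element

$$I_{pq} \equiv E\left( \frac{\partial \ell}{\partial \tau_p} \, \frac{\partial \ell}{\partial \tau_q} \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{a=1}^{N} \sum_{b=1}^{N} \frac{P^{p}_{ia} P^{q}_{jb}}{P_{ia} Q_{ia} P_{jb} Q_{jb}} \operatorname{Cov}(u_{ia}, u_{jb}) \qquad (p, q = 1, 2, \ldots, M).$$

Because of local independence and random sampling of examinees,

$$\operatorname{Cov}(u_{ia}, u_{jb}) = \delta_{ij} \delta_{ab} P_{ia} Q_{ia},$$

where $\delta_{st} = 1$ if $s = t$, $\delta_{st} = 0$ otherwise. Thus the typical
element is

$$I_{pq} = \sum_{i=1}^{n} \sum_{a=1}^{N} \frac{P^{p}_{ia} P^{q}_{ia}}{P_{ia} Q_{ia}} \qquad (p, q = 1, 2, \ldots, M). \tag{6}$$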
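
The terms of (6) contributed by a single item-examinee pair can be sketched as follows. This is an illustrative sketch only, assuming the three-parameter logistic function with the 1.7 constant; the function names are hypothetical, and the derivatives are written out by hand from that function rather than taken from the report.

```python
import math

def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def p3pl_gradient(theta, a, b, c):
    """Return (dP/da, dP/db, dP/dc, dP/dtheta) for the 3PL function."""
    logistic = 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))
    slope = 1.7 * (1.0 - c) * logistic * (1.0 - logistic)
    return (slope * (theta - b), -slope * a, 1.0 - logistic, slope * a)

def information_terms(theta, a, b, c):
    """Terms P^p P^q / (P Q) of (6) for one item-examinee pair.

    Returns the 3-by-3 item-item terms, the 3-by-1 item-ability terms,
    and the scalar ability-ability term. Summing the first over examinees,
    or the last over items, gives the corresponding information totals.
    """
    p = p3pl(theta, a, b, c)
    pq = p * (1.0 - p)
    da, db, dc, dtheta = p3pl_gradient(theta, a, b, c)
    item = (da, db, dc)
    item_item = [[u * v / pq for v in item] for u in item]
    item_ability = [dtheta * u / pq for u in item]
    ability_ability = dtheta * dtheta / pq
    return item_item, item_ability, ability_ability

# Hypothetical values, for illustration only.
item_item, item_ability, ability_ability = information_terms(0.3, 1.1, -0.2, 0.17)
```

Because every other derivative $P^{p}_{ia}$ is zero, these are the only nonzero contributions that the pair $(i, a)$ makes to the information matrix.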


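Note that $P^{p}_{ia}$ is zero unless either $\tau_p$ and $a$ refer to the
same examinee, or $\tau_p$ and $i$ refer to the same item. Thus

$$\| I_{pq} \| = \left[ \begin{array}{cccc|cccc}
S_1 & 0 & \cdots & 0 & f_{11} & f_{12} & \cdots & f_{1N'} \\
0 & S_2 & \cdots & 0 & f_{21} & f_{22} & \cdots & f_{2N'} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & S_n & f_{n1} & f_{n2} & \cdots & f_{nN'} \\ \hline
f'_{11} & f'_{21} & \cdots & f'_{n1} & t_1 & 0 & \cdots & 0 \\
f'_{12} & f'_{22} & \cdots & f'_{n2} & 0 & t_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
f'_{1N'} & f'_{2N'} & \cdots & f'_{nN'} & 0 & 0 & \cdots & t_{N'}
\end{array} \right] \equiv \begin{bmatrix} S & F \\ F' & T \end{bmatrix}, \tag{7}$$

where $N' \equiv N - 2$, $S_i$ is the 3-by-3 Fisher information matrix for
$a_i$, $b_i$, and $c_i$, $t_a$ is the Fisher information for examinee $a$,
and $f_{ia}$ is the 3-by-1 joint Fisher information vector for item $i$
and examinee $a$:

$$f_{ia} \equiv \frac{\partial P_{ia} / \partial \theta_a}{P_{ia} Q_{ia}} \begin{bmatrix} \partial P_{ia} / \partial a_i \\ \partial P_{ia} / \partial b_i \\ \partial P_{ia} / \partial c_i \end{bmatrix}.$$

3. Matrix Inversion

The following general formula for inverting a partitioned matrix
may be applied to (7):

$$\begin{bmatrix} S & F \\ F' & T \end{bmatrix}^{-1} = \begin{bmatrix} S^{-1} + S^{-1} F Z^{-1} F' S^{-1} & -S^{-1} F Z^{-1} \\ -Z^{-1} F' S^{-1} & Z^{-1} \end{bmatrix}, \tag{8}$$

where

$$Z \equiv T - F' S^{-1} F. \tag{9}$$

The matrix $S$ is easily inverted since it is a diagonal supermatrix:

$$S^{-1} = \operatorname{diag}(S_1^{-1}, S_2^{-1}, \ldots, S_n^{-1}).$$

The notation on the right denotes a diagonal matrix with diagonal
elements $S_i^{-1}$. These last are easily computed since each is only
3 by 3.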
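
As a small check on the partitioned-inverse formula (8), the sketch below assembles the inverse block by block and compares it with the original matrix. It is a simplified illustration, not the report's program: it assumes that S is an ordinary diagonal matrix (rather than a supermatrix of 3-by-3 blocks) and that T is 1-by-1, so that Z is a scalar; the function name and the numbers are hypothetical.

```python
def partitioned_inverse(s_diag, f, t):
    """Invert [[S, F], [F', T]] by formula (8).

    Simplifying assumptions of this sketch: S is diagonal with entries
    s_diag, F is a single column f, and T is the scalar t.
    """
    n = len(s_diag)
    s_inv_f = [f[i] / s_diag[i] for i in range(n)]        # S^{-1} F
    z = t - sum(f[i] * s_inv_f[i] for i in range(n))      # Z = T - F' S^{-1} F
    inv = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            inv[i][j] = s_inv_f[i] * s_inv_f[j] / z       # S^{-1} F Z^{-1} F' S^{-1}
        inv[i][i] += 1.0 / s_diag[i]                      # ... plus S^{-1}
        inv[i][n] = inv[n][i] = -s_inv_f[i] / z           # off-diagonal blocks
    inv[n][n] = 1.0 / z                                   # Z^{-1}
    return inv

# Hypothetical values, chosen only so that the matrix is nonsingular.
s_diag, f, t = [4.0, 5.0, 6.0], [1.0, 2.0, 0.5], 3.0
inverse = partitioned_inverse(s_diag, f, t)
```

The point of the formula is that only S and Z ever need to be inverted directly; the full matrix is never passed to a general inversion routine.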

All the matrix operations indicated on the right side of (8) can
be carried out on the computer without difficulty, with one exception:
the inversion of $Z$, which is $N'$ by $N'$. The approximation used here
to invert $Z$ relies on grouping the $\theta_a$ into 16 class intervals of
width 0.5, covering the range $-5 \le \theta \le 3$. Each $\theta_a$ in a given
class interval is replaced by the midpoint of the interval.

Now $T$ will be a diagonal supermatrix $T \equiv \operatorname{diag}(T_g)$, where
$T_g \equiv t_g I$ is a scalar matrix with dimensions $N_g$ by $N_g$, and $N_g$
is the number of people in class interval $g$. Also, $F$ will be a row
vector of 16 matrices, the columns of any one matrix being all identical:

$$F \equiv [\, f_1 1_1', \; f_2 1_2', \; \ldots, \; f_{16} 1_{16}' \,], \tag{10}$$

where $f_g \equiv \{ f_{ia}' \}'$ (the $f_{ia}$ for $i = 1, \ldots, n$ stacked into
a single column) for any examinee $a$ in class interval $g$, and
$1_g$ is a unit vector whose length is $N_g$.

The product $F' S^{-1} F$ can now be written as a 16-by-16 supermatrix:

$$F' S^{-1} F = \| 1_g f_g' S^{-1} f_h 1_h' \|.$$

Denote the scalar $f_g' S^{-1} f_h$ by $w_{gh}$. We now have

$$Z = T - \| M_{gh} \|, \tag{11}$$

$$M_{gh} \equiv w_{gh} 1_g 1_h'. \tag{12}$$

For computation purposes, $Z$ still has $N'$ rows and columns,
not just 16. For the usual sample size, it is still not feasible to
invert $Z$ with a standard inversion program.

Consider the problem of inverting $Z_{11}$, the $N_1$-by-$N_1$ upper
left corner of $Z$. By (11), (12), and a standard formula,

$$Z_{11}^{-1} \equiv [\, T_1 - w_{11} 1_1 1_1' \,]^{-1} = T_1^{-1} + \frac{w_{11} T_1^{-1} 1_1 1_1' T_1^{-1}}{1 - w_{11} 1_1' T_1^{-1} 1_1}. \tag{13}$$

Since $T_1 = t_1 I$, where $t_1$ is scalar, this becomes

$$Z_{11}^{-1} = \frac{1}{t_1} I + \frac{w_{11} 1_1 1_1'}{t_1^{2} - t_1 w_{11} N_1}.$$
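
The closed form for the inverse of $Z_{11}$ can be verified directly on a small case. The sketch below is illustrative only; the function name and the values of $t_1$, $w_{11}$, and $N_1$ are hypothetical and were chosen so that $Z_{11}$ is nonsingular.

```python
def z11_inverse(t1, w11, n1):
    """Closed-form inverse of Z_11 = t1 * I - w11 * 1 1' (n1-by-n1)."""
    common = w11 / (t1 * t1 - t1 * w11 * n1)   # added to every element
    return [[common + (1.0 / t1 if i == j else 0.0) for j in range(n1)]
            for i in range(n1)]

# Hypothetical values, for illustration only.
t1, w11, n1 = 2.0, 0.3, 4

# Build Z_11 explicitly: t1 - w11 on the diagonal, -w11 elsewhere.
z11 = [[(t1 if i == j else 0.0) - w11 for j in range(n1)] for i in range(n1)]
z11_inv = z11_inverse(t1, w11, n1)

# The product should be the identity matrix, up to rounding error.
product = [[sum(z11[i][k] * z11_inv[k][j] for k in range(n1))
            for j in range(n1)] for i in range(n1)]
```

Only two distinct numbers are needed to describe the inverse, however many examinees fall in the class interval, which is what keeps the grouped computation feasible.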


Next, the upper left 2-by-2 supermatrix in $Z$ can be inverted
as in (8), using the standard formula for the inversion of a
partitioned matrix:

$$\begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix}^{-1} = \begin{bmatrix} Z_{11}^{-1} + Z_{11}^{-1} Z_{12} H^{-1} Z_{21} Z_{11}^{-1} & -Z_{11}^{-1} Z_{12} H^{-1} \\ -H^{-1} Z_{21} Z_{11}^{-1} & H^{-1} \end{bmatrix}, \tag{14}$$

where $H \equiv Z_{22} - Z_{21} Z_{11}^{-1} Z_{12}$. It can be seen that $H$ has the
same general form as $Z_{11}$ and can thus be inverted as in (13);
so (14) can readily be calculated.

Next, substitute (14) for $Z_{11}^{-1}$ in the foregoing procedure,
and repeat this procedure, in such a way as to invert the upper
left 3-by-3 supermatrix in $Z$. A total of fifteen repetitions enable
us to invert the 16-by-16 supermatrix $Z$. Equation (8) is now used
for one final inversion, the result being the desired variance-covariance
matrix of all $N + 3n - 2$ parameters.

The 16-by-16 variance-covariance supermatrix for the $\hat\theta$ consists
of 256 blocks. The elements are all the same within a block except
for diagonal blocks, each of which has a variance (instead of a
covariance) repeated along its diagonal. Any two examinees in the
same class interval will have identical $\operatorname{Var} \hat\theta$ and identical sampling
covariances with any other given parameter estimate.
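
The grouping step on which this section relies (16 class intervals of width 0.5 covering the range from -5 to 3, each ability replaced by the midpoint of its interval) might be sketched as follows. The function name is hypothetical, and the treatment of values outside the range, which are assigned to the nearest end interval, is an assumption of this sketch rather than something stated in the report.

```python
def group_midpoints(thetas, lower=-5.0, width=0.5, n_intervals=16):
    """Return (interval index, interval midpoint) for each ability.

    Values outside the covered range are assigned to the nearest end
    interval; that clamping is an assumption of this sketch.
    """
    grouped = []
    for theta in thetas:
        g = int((theta - lower) // width)
        g = min(max(g, 0), n_intervals - 1)
        grouped.append((g, lower + (g + 0.5) * width))
    return grouped

# Hypothetical abilities, for illustration only.
example = group_midpoints([-4.9, -0.2, 0.0, 1.3, 2.99])
print(example)
```

After this replacement every examinee in interval g shares the same information quantities, so the examinee part of the problem is described by 16 groups and their counts.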

4. Reparameterization

In Section 1, in order to have identifiable parameters, an origin
and scale was chosen so that $\theta_{N-1}$ and $\theta_N$ had arbitrary preassigned
values. Any other choice of origin and scale would result in a linear
transformation of parameters. The likelihood function would remain
unchanged for every pattern of item responses.

The choice of unit (but not the choice of origin) has one
completely obvious effect on the sampling errors of parameter estimates.
If the unit is changed, the standard errors for the $b$'s and $\theta$'s
will be multiplied by the ratio of the new scale unit to the old scale
unit. The standard errors for the $a$'s will be divided by this ratio.

A second important effect is easily overlooked: the standard error
of the maximum likelihood estimator depends not only on the choice of
scale, but also on how the (origin and) scale is specified.

Suppose that the true numerical values of all $\theta_a$ ($a = 1, \ldots, N$)
are specified on some arbitrary scale. Suppose next that our test is
too difficult for examinee $N$. This means that the likelihood function
is rather insensitive to variations in $\theta_N$. If we could repeat
our testing with several parallel test forms, we would find a wide range
of estimates of $\theta_N$. In such a situation, the difference between
true $\theta_N$ and $\theta_{N-1}$ clearly cannot be estimated well from the
examinee responses. If we define the scale by treating $\theta_N$ and
$\theta_{N-1}$ as known, our estimates of every $\theta_a$ may fluctuate grossly,
simply because the scale unit $\theta_N - \theta_{N-1}$ is not well determined by
the data.

Suppose next that we relabel all examinees so that examinees
$N - 1$ and $N$ are not the same examinees as before. The ability scale
has not been changed from the preceding paragraph; it is the procedure
for defining the scale that has been changed. The true $\theta$ for each
examinee is still the same as before. Suppose the new examinees $N - 1$
and $N$ are both at ability levels where our test measures accurately.
If, further, the true $\theta_{N-1}$ and $\theta_N$ are substantially different
from each other, the difficulty of the previous paragraph disappears:
throughout the ability range where the test is designed to measure
accurately, the standard errors of all $\hat\theta_a$ may be reasonably small.

For example, suppose on some scale $\theta_1 = -3$, $\theta_2 = -2$, $\theta_3 = -1$,
$\theta_4 = 0$, $\theta_5 = 1$, $\theta_6 = 2$, $\theta_7 = 3$. We can specify this same scale
in terms of any two of these $\theta$'s. The standard errors that we obtain
will depend in an overwhelming way not just on the ability scale, but on
how we specify it. We cannot rectify the standard errors by some
simple procedure, such as multiplying each by a constant.

For this reason, our procedure for specifying the ability scale
should depend only on parameters or functions of parameters that are
accurately determined by the data. A robust mean of the $\hat\theta_a$ might
seem attractive; however, any function of the $\hat\theta_a$ is counterindicated
by the fact that sometimes $\hat\theta_a = \pm\infty$.

The procedure used here is to choose a set of $m$ discriminating,
moderately easy items and a set of $r$ discriminating, moderately
hard items. We will hereafter define the origin and unit for our
new parameters, to be denoted by capital letters, so that the mean
of the (true) $B$-parameters for the easy items is zero, and the mean
for the hard items is one.

Our new parameters are related to our old parameters (from
Section 2 or from Section 5) by linear transformations:

$$A_i \equiv k a_i, \qquad B_i \equiv K + b_i / k, \qquad C_i \equiv c_i, \qquad \Theta_a \equiv K + \theta_a / k, \tag{15}$$

($a = 1, 2, \ldots, N$; $i = 1, 2, \ldots, n$),

where $k$ and $K$ are transformation constants to be determined.


Since

$$\bar B_0 \equiv \frac{1}{m} \sum^{m} B_i = 0, \qquad \bar B_1 \equiv \frac{1}{r} \sum^{r} B_i = 1, \tag{16}$$

the values of $k$ and $K$ are found by substituting (15) into (16) and
solving for $k$ and $K$:

$$k = \bar b_1 - \bar b_0, \qquad K = -\bar b_0 / k, \tag{17}$$

where $\bar b_0$ and $\bar b_1$ are means for $m$ and $r$ items, respectively.

To find the variance-covariance matrix for estimates of the
upper-case parameters, rewrite (15) as

$$\Theta_a = (\theta_a - \bar b_0) / k, \qquad A_i = k a_i, \qquad B_i = (b_i - \bar b_0) / k. \tag{18}$$
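
The transformation to the upper-case scale can be sketched numerically. The function names below are hypothetical; the two means, -.305 for the easy items and .671 for the hard items, are the true values quoted in Section 7, and the item used in the last line is item 1 of Table 1.

```python
def scale_constants(b_easy_mean, b_hard_mean):
    """Constants k and K that put the easy-item mean of B at 0
    and the hard-item mean of B at 1."""
    k = b_hard_mean - b_easy_mean
    K = -b_easy_mean / k
    return k, K

def to_upper_case(a, b, c, theta, k, K):
    """Linear transformation of (a, b, c, theta) to (A, B, C, Theta)."""
    return k * a, K + b / k, c, K + theta / k

k, K = scale_constants(-0.305, 0.671)

# Item 1 of Table 1 (a = .99, b = -2.01, c = .17); theta = 0 is arbitrary.
A1, B1, C1, Theta0 = to_upper_case(0.99, -2.01, 0.17, 0.0, k, K)
print(round(k, 3), round(K, 4), round(A1, 3), round(B1, 3))
```

With these inputs k is .976 and K is about .312, and the transformed item 1 agrees with the tabled upper-case values to within rounding of the published figures.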


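Because of the special properties of maximum likelihood estimators,
equations (18) still hold when estimators are substituted for parameters.
Thus the sampling variances and covariances for estimates of the new
parameters can be computed from the sampling variances and covariances
already obtained at the end of Section 3. Formulas for doing this can
be written down from (18) by using the 'delta' method (Kendall &
Stuart, 1969, Chapter 10). For example,

$$\begin{aligned}
\operatorname{Cov}(A_i, \Theta_a) = {} & \operatorname{Cov}(a_i, \theta_a) - \operatorname{Cov}(a_i, \bar b_0) - \frac{\theta_a - \bar b_0}{k} \operatorname{Cov}(a_i, k) \\
& + \frac{a_i}{k} \operatorname{Cov}(\theta_a, k) - \frac{a_i}{k} \operatorname{Cov}(\bar b_0, k) - \frac{a_i (\theta_a - \bar b_0)}{k^{2}} \operatorname{Var} k,
\end{aligned} \tag{19}$$

where

$$\operatorname{Cov}(\bar b_0, k) = \operatorname{Cov}(\bar b_1, \bar b_0) - \operatorname{Var} \bar b_0,$$

$$\operatorname{Cov}(\bar b_1, \bar b_0) = \frac{1}{mr} \sum_{i}^{m} \sum_{j}^{r} \operatorname{Cov}(b_i, b_j).$$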
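
The delta-method step behind this example can be sketched as follows. This is a reconstruction under the usual first-order assumption (small sampling errors, second-order terms dropped), not a derivation quoted from the report; $d$ denotes the sampling error of an estimate.

```latex
% First-order differentials of A_i = k a_i and Theta_a = (theta_a - \bar b_0)/k
\begin{aligned}
dA_i &= k\,da_i + a_i\,dk, \\
d\Theta_a &= \frac{1}{k}\,(d\theta_a - d\bar b_0) - \frac{\theta_a - \bar b_0}{k^{2}}\,dk .
\end{aligned}

% Covariance as the expected product of the two differentials
\begin{aligned}
\operatorname{Cov}(A_i, \Theta_a) \approx E(dA_i\, d\Theta_a)
 = {} & \operatorname{Cov}(a_i, \theta_a) - \operatorname{Cov}(a_i, \bar b_0)
        - \frac{\theta_a - \bar b_0}{k}\operatorname{Cov}(a_i, k) \\
      & + \frac{a_i}{k}\operatorname{Cov}(\theta_a, k)
        - \frac{a_i}{k}\operatorname{Cov}(\bar b_0, k)
        - \frac{a_i(\theta_a - \bar b_0)}{k^{2}}\operatorname{Var} k .
\end{aligned}

% Since k = \bar b_1 - \bar b_0, covariances with k reduce to covariances with the two means
\operatorname{Cov}(x, k) = \operatorname{Cov}(x, \bar b_1) - \operatorname{Cov}(x, \bar b_0),
\qquad
\operatorname{Var} k = \operatorname{Var}\bar b_1 + \operatorname{Var}\bar b_0
   - 2\operatorname{Cov}(\bar b_1, \bar b_0).
```

Each of the six terms is the product of one term of $dA_i$ with one term of $d\Theta_a$, with the factors of $k$ cancelling in the first two.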

5. Parameter Estimation

The maximum likelihood estimators (MLE) satisfy the likelihood
equations (5). In (5), there is one equation for each parameter
omitting $\theta_{N-1}$ and $\theta_N$. If all $N + 3n = M + 2$ MLE are linearly
transformed, as for example in (15), the transformed parameters will
still satisfy the likelihood equations.

Since the origin and scale for the new parameters are chosen to
satisfy (16), the appropriate $k$ and $K$ are obtained from (17)
after replacing $\bar b_0$ and $\bar b_1$ by their MLE. The likelihood function
(3) is unaffected by these linear transformations.

The computer program LOGIST identifies the parameters by still
another choice of origin and scale:

1. a certain truncated mean of the $\theta_a$ ($a = 1, 2, \ldots, N$) is set
equal to zero,

2. a certain truncated standard deviation of the $\theta_a$ is set
equal to one.

We will use the usual lower-case symbols for parameters on this
LOGIST scale. This should not cause confusion, since the lower-case
parameters of Sections 1-3 will not be needed again.

If we start with LOGIST $\hat a_i$, $\hat b_i$, $\hat c_i$, and $\hat\theta_a$ and determine
$k$ and $K$ so that $\bar B_0 = 0$ and $\bar B_1 = 1$, then the $\hat A_i$, $\hat B_i$,
$\hat C_i$ ($i = 1, 2, \ldots, n$), and the $\hat\Theta_a$ ($a = 1, 2, \ldots, N$), calculated by
substituting estimated values into (15), will still satisfy the
likelihood equations. The upper-case parameter estimates so obtained
should have the sampling variance-covariance matrix found theoretically
at the end of Section 4. Our remaining task is to compare an
empirically determined variance-covariance matrix of MLE's with the
corresponding theoretical matrix.


6. Recapitulation


We have used, at different points, three different arbitrary
scales for our parameters:

1. $\theta_N$ and $\theta_{N-1}$ are assigned arbitrarily.

2. The origin is set at $\bar b_0$, the unit is $\bar b_1 - \bar b_0$.

3. The origin is set at a truncated mean of the $\theta_a$,
the unit is a truncated standard deviation of the $\theta_a$.

Scale 1 (denoted by lower-case symbols) is most convenient
mathematically for the difficult task of inverting the $M$-by-$M$
information matrix. Scale 1 is not useful for practical purposes,
however, since its use grossly inflates all the sampling variances.

Scale 2 (denoted by upper-case symbols) seems the simplest choice
in an attempt to keep the sampling error in the estimated origin and
unit as small as possible. The sampling variances computed for scale
1 are transformed (see eq. 19) to values appropriate for scale 2.
Although scale 2 is not the familiar one, the two item sets used to
specify the scale can be chosen so that the numerical values of $A_i$,
$B_i$, $C_i$ differ little from the familiar $a_i$, $b_i$, and $c_i$
produced by LOGIST.

Scale 3 (hereafter denoted by lower-case symbols) is the scale
used by LOGIST.


7. Empirical Estimation Procedures

As already stated, our theoretical results can be trusted only
if they are shown to be in reasonable agreement with empirical results.
For this purpose, artificial data $\| u_{ia} \|$ were created representing
the administration of a 45-item test to a random sample of 1500
examinees. The 1500 $\theta_a$ were a spaced sample drawn from a distribution
of abilities from a regular test administration. Six replicate matrices
of $\| u_{ia} \|$ were independently generated, using the same item parameters and
the same 1500 $\theta_a$. The variation in responses across these matrices thus
represents random fluctuations in $u_{ia}$ for fixed $a_i$, $b_i$, $c_i$, and
$\theta_a$.

Further replication was also built in: items 16-30 and items 31-45
had the same item parameters as items 1-15. The true lower-case and
upper-case item parameters are shown in Table 1 for items 1-15.

Table 1

True (Upper Case) Item Parameters

Item
No.       A       a        B        b    C or c
  1     .96     .99    -1.75    -2.01     .17
  2     .34     .35    -1.33    -1.61     .17
  3    1.34    1.38     -.80    -1.09     .17
  4     .76     .78     -.48     -.77     .17
  5     .41     .42     -.38     -.67     .17
  6     .90     .92     -.04     -.34     .17
  7     .90     .92      .16     -.15     .17
  8    1.04    1.06      .31      .00     .17
  9    1.31    1.34      .42      .11     .13
 10    1.46    1.50      .58      .26     .34
 11     .85     .87      .79      .46     .17
 12     .60     .62      .90      .57     .17
 13    1.06    1.09     1.01      .68     .25
 14    1.36    1.39     1.23      .90     .29
 15    1.46    1.50     1.50     1.16     .18

Six independent runs were made on LOGIST, one for each group of
1500 examinees. For each run separately, $\bar b_0$ was calculated from items
4-9, 19-24, 34-39; $\bar b_1$ was calculated from items 10-15, 25-30, 40-45.
It is convenient for our ultimate interpretation of the standard errors
to be obtained that the true $\bar b_1 - \bar b_0 = .671 - (-.305) = .976$. Since
this is close to 1.0, the scale unit for the capitalized parameters
is very close to the scale unit for the lower-case (LOGIST) parameters.

For each run separately, all lower-case parameter estimates were
linearly transformed as in (15) to the upper-case scale, using
estimated $k$ and $K$ values. For the data reported in subsequent sections,
the true $k = .976$ and the true $K = .312$. Since the six runs are
independent, an unbiased empirical estimate of the sampling variance of
any parameter estimate $T$ is given by

$$s_T^{2} \equiv \frac{1}{5} \sum^{6} (T - \bar T)^{2}, \tag{20}$$

the sum being across the six LOGIST runs and $\bar T$ being the mean of $T$
over those runs. If the $T$ in (20) were normally distributed,
$s_T^{2} / \sigma_T^{2}$ would have an $F$ distribution with 5
and $\infty$ degrees of freedom.

Since three different items have identical item parameters, the
$s_T^{2}$ for a single item parameter can be averaged across these three items
to yield the best available unbiased estimate:

$$\bar s_T^{2} \equiv \frac{1}{3} \sum_{i}^{3} s_{T_i}^{2}. \tag{21}$$

Note that it would be incorrect to pool all 18 values of $T$ in
an equation like (20), since $T$ from the same LOGIST run are not
independent.
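
The two-stage estimate described here, an unbiased variance across the six runs for each replicate item followed by an average over the three replicate items, can be sketched as follows. The function name and the numbers are hypothetical illustrations, not results from the report.

```python
from statistics import mean, variance

def replicate_variance(estimates_by_item):
    """Average, over replicate items, of the unbiased variance across runs.

    estimates_by_item: one list per replicate item, each holding the
    estimates of the same parameter from the independent runs.
    """
    # statistics.variance divides by (number of runs - 1), as in (20).
    return mean(variance(runs) for runs in estimates_by_item)

# Hypothetical estimates: three replicate items, six runs each.
estimates = [
    [0.31, 0.28, 0.35, 0.30, 0.27, 0.33],
    [0.29, 0.34, 0.32, 0.26, 0.30, 0.31],
    [0.33, 0.30, 0.29, 0.35, 0.28, 0.31],
]
s2_bar = replicate_variance(estimates)
print(s2_bar)
```

The variance is taken within each item across runs, never across the 18 pooled values, because estimates from the same run share sampling error and are not independent.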

If  and  represent  two  different  item  parameters  in  the 

same  item 


_  '  ]  J  . 

s(Ti,  Si)  =  J  ■'  s(i\,  Sj)  l  --') 

which  is  the  same  as  (21)  except  that  covariances  are  suhst  i  tilted  !■  r 
variances.  If  T.  and  S.  represent  item  parameters  in  ditlereiu  iums, 
then  there  are  nine  different  sample  covariances  to  be  summed: 


t 


1 
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;<vv  ■  I  ” 


(23) 


If  T  is  an  ability  parameter,  (20)  still  holds.  For  our  purposes, 
replacing  T  by  0  ,  we  can  write 


-2  1  vg  2 
s'  =  —  2  s  ■ 

a  »  a 


o- 


where  the  sum  is  over  all  examinees  in  group  g  .  When  •  is  at  the 
midpoint  of  interval  g  ,  this  average  should  be  roughly  equal  to  the 
j  ;  obtained  in  Section  4. 

If  subscripts  a  and  b  denote  different  examinees  in  group 

&  » 


8(0  .2,  ) 
a  b 


N  (N  -  1)  l"a’  V 


(25' 


8  8 


a>b 


where  the  sum  is  over  all  pairs  of  examinees  in  group  g  .  If  a  and 
b  denote  examinees  in  groups  g  and  h  respect ivelv  (  g  ^  h  ) ,  then 


NT 


1 


g 


N, 


S  ( ' '  ,  0  )  —  — -  '  J  s  f '  ,a  ) 

a’  b  N  N,  ‘  .  V  a’  b' 

g  h  a=l  b=l 


Ct.  ) 


Finally,  if  i  is  an  item  parameter  and  examinee  a  is  in  group  g,  , 


then 
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sCr.  ,o  )  = 

i  a 


1 

3N 


N 

J  g 


s  (T  l 


( 2; ) 


In  computing  (21)  -  (27),  examinees  are  grouped  on  their  true  values, 
not  on  their  estimated  values. 

A problem arises when an examinee obtains a perfect score or a
zero score. In this case his $\hat\theta$ is infinite and cannot be
advantageously used. Instead of making some ad hoc adjustment, the 17
examinees for whom this occurred were simply removed from the group of
examinees studied, leaving N = 1483. This has the effect of slightly
biasing $s_{\hat\theta}$ for the remaining most extreme $\theta$ values.
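The exclusion rule described above is easy to state in code. The following is a minimal sketch assuming each examinee's responses are stored as a list of 0/1 item scores; the function name and the toy data are hypothetical.

```python
def drop_extreme_scores(responses):
    """Keep only examinees whose number-right score is neither zero nor perfect."""
    return [r for r in responses if 0 < sum(r) < len(r)]

# Hypothetical response vectors for four examinees on a 3-item test.
responses = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0]]
kept = drop_extreme_scores(responses)
print(len(responses) - len(kept), "examinees removed")
```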


8. Numerical Standard Errors

Since the c parameters of easy items usually cannot be
accurately estimated, LOGIST in ordinary use does not estimate them
individually. This would prevent the empirical standard errors of
Section 7 from agreeing with the theoretical standard errors of Section
4. Since our main purpose is to show that the method of Section 4 can
give useful results, the empirical and theoretical standard errors
reported here are all estimated or calculated under the condition that
the true values of $c_i$ are known for i = 1, 2, 3, 4, 5, 12. Items 1-5
are easy items; item 12 was included because of its low $a_i$. For
empirical work, the true c values were supplied to LOGIST, which held
them fixed while estimating all other parameters. For theoretical work,
the rows and columns of (7) corresponding to $C_1, \ldots, C_5$ and
$C_{12}$ were simply deleted from the information matrix (7) before
inversion.
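Deleting rows and columns of the information matrix before inversion can be illustrated on a toy example. The sketch below is not the method of Section 4 (which exploits the structure of the full matrix); it uses a plain Gauss-Jordan inverse on a hypothetical 3 x 3 information matrix to show how treating one parameter as known changes the variance of another.

```python
def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def drop_parameters(info, known):
    """Delete the rows and columns of `info` for parameters treated as known."""
    keep = [i for i in range(len(info)) if i not in set(known)]
    return [[info[i][j] for j in keep] for i in keep]

# Hypothetical information matrix for three parameters.
info = [[4.0, 1.0, 2.0],
        [1.0, 3.0, 0.0],
        [2.0, 0.0, 5.0]]
var_all_estimated = invert(info)[0][0]
var_third_known = invert(drop_parameters(info, [2]))[0][0]
print(var_all_estimated, var_third_known)
```

In this toy case the variance of the first parameter is smaller when the third is treated as known, which mirrors the comparison the paper makes between fixed and estimated c parameters.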

Table 2 compares the empirical standard errors of Section 7 for
B with the theoretical standard errors of Section 4. The last three
columns show the squared ratios for the three replications of each
item; each of these ratios will have an F distribution with 5 and
$\infty$ degrees of freedom provided i) B has a normal sampling
distribution, ii) B is unbiased, and iii) the theoretical $\sigma_B$
from Section 4 is correct. An F above 2.21 or below .229 is significant
at the (two-tailed) 10 percent level. Eleven of the ratios are
significant. The number of ratios less than 1 is approximately the same
as the number of ratios greater than 1.
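The critical values quoted above can be checked numerically. An F variable with 5 and infinitely many degrees of freedom is a chi-square variable with 5 degrees of freedom divided by 5, so its 5th and 95th percentiles follow from the chi-square distribution. The sketch below uses only the standard library; the function names are hypothetical, and the chi-square CDF is computed from the series for the regularized incomplete gamma function.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / math.gamma(a + 1.0)
    total = term
    for n in range(1, 500):
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z))

def f_quantile_inf_denominator(p, df):
    """Quantile of F(df, infinity), i.e. of chi-square(df) / df, by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0 / df

lower = f_quantile_inf_denominator(0.05, 5)
upper = f_quantile_inf_denominator(0.95, 5)
print(round(lower, 3), round(upper, 2))  # compare with .229 and 2.21 in the text
```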

In the past, the only available standard errors for item
parameters assumed that the $\theta$ were known. Such standard errors
for B, for known $\theta$, are given in the second column of the table.
A comparison of second and third columns shows very close agreement
except for the three easiest items (1, 2, 3). For these three items, our
new theoretical value is larger and agrees better with the empirical
value. This gives support to the new theoretical values. The fact
that the empirical values (from Section 7) tend to be larger than
the theoretical (from Section 4) could be due to n and N not
being large enough for asymptotic results. A second likely explanation
is that LOGIST was not really run to complete convergence.

Table 3 makes comparisons for A. Again the standard errors
of A with unknown $\theta$ agree closely with the results when $\theta$
is known. The empirical standard errors, although correlating well with
the theoretical, seem to be larger. Eleven of the F ratios are
significant. Similar statements apply to Table 4, which shows the
comparisons for C.


Table 2

Theoretical and Empirical Standard Errors for B

Item    σ_B          σ_B          s_B          s_B² / σ_B²
No.     (θ known)    (Sect. 4)    (Sect. 7)    (three replications)

 1*     .110         .156         .183          .23      .56     [illegible]
 2*     .186         .201         .237         1.76     1.49      .93
 3*     .045         .071         .063         1.38      .59      .41
 4*     .060         .068         .066          .90      .76     1.17
 5*     .100         .099         .103          .37      .40     2.48†
 6      .125         .121         .131          .28      .63     2.63†
 7      .113         .110         .100         1.24      .65      .58
 8      .084         .083         .088         2.31†     .97      .16†
 9      .055         .055         .067          .?7     2.63†    1.47
10      .069         .069         .106         3.19†    3.62†     .33
11      .100         .097         .122         1.45     2.55†     .70
12*     .094         .091         .087          .85     1.27      .66
13      .086         .083         .094         1.01     1.20     1.57
14      .077         .076         .111         1.19     1.49     3.75†
15      .072         .075         .093          .40     2.62†    1.63

†Significant at 10 percent level.

*The c parameter for these items is treated as known.


Table 3

Theoretical and Empirical Standard Errors for A

Item    σ_A          σ_A          s_A          s_A² / σ_A²
No.     (θ known)                              (three replications)

 1*     .088         .105         .141          .95      .91     3.60†
 2*     .044         .046         .039          .88      .51      .74
 3*     .097         .117         .094         1.39      .32      .22†
 4*     .060         .065         .080          .89     2.71†     .86
 5*     .045         .047         .054          .63     2.44†     .93
 6      .103         .102         .123         1.54      .30     2.51†
 7      .105         .105         .147         1.30     2.25†    2.35†
 8      .113         .115         .159         1.29     3.20†    1.29
 9      .123         .128         .182         1.8?     3.39†     .80
10      .184         .193         .160          .71      .55      .79
11      .115         .120         .132         1.42     1.85      .34
12*     .060         .060         .076          .95     2.94†     .94
13      .151         .157         .187         2.40†    1.08      .79
14      .209         .218         .240         1.32      .91     1.43
15      .222         .233         .182          .25      .65      .93

†Significant at 10 percent level.

*The c parameter for these items is treated as known.

Table 5 compares standard errors for $\hat\theta$. Let us leave column 3
for later discussion. Columns 4 and 5 show standard errors of
$\hat\theta$ corresponding to the $\theta$ value in the first column;
column 6, however, is computed from (24) for the group of people falling
in the class interval with midpoint $\theta$. There is good agreement
between empirical and theoretical standard errors except for
$\theta < -1.5$. For low $\theta$, asymptotic results do not appear to
hold with the usual n and N.

Table 5 shows close agreement of our standard error from Sections
2-4 with the standard error of $\hat\theta$ when the item parameters are
known. The agreement shown here and in previous tables suggests that (1)
is a good approximation to the diagonal of (2), and similarly for item
parameters, and that (2) agrees well with the empirical standard errors.

A comparison of the third and fifth columns in Table 5 shows what
happens to $\sigma_{\hat\theta}$ when all $c_i$ must be estimated from
the data: For $\theta < -1$, $\sigma_{\hat\theta}$ is sharply affected;
for $0 < \theta < 2.5$, there is very little effect.
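The comparison of the third and fifth columns of Table 5 can be made concrete by taking their ratio at each ability level. The numbers below are copied from those two columns of Table 5; the variable names are hypothetical and the ratio is only an illustrative summary, not a statistic used in the paper.

```python
# theta midpoint -> (SE with all c_i unknown, SE with c_1..c_5 and c_12 known),
# taken from columns 3 and 5 of Table 5.
table5 = {
    -2.75: (2.090, 0.966), -2.25: (1.296, 0.699), -1.75: (0.861, 0.525),
    -1.25: (0.607, 0.404), -0.75: (0.456, 0.342), -0.25: (0.349, 0.295),
     0.25: (0.278, 0.263),  0.75: (0.261, 0.261),  1.25: (0.303, 0.290),
     1.75: (0.422, 0.387),  2.25: (0.628, 0.580),  2.75: (0.931, 0.878),
}

# Ratio > 1 means the standard error grows when every c_i must be estimated.
inflation = {theta: all_unknown / some_known
             for theta, (all_unknown, some_known) in table5.items()}

for theta, ratio in inflation.items():
    print(f"{theta:6.2f}  {ratio:.2f}")
```

The ratios exceed 1.4 for every midpoint below -1 and stay under 1.1 for midpoints between 0 and 2.5, in line with the statement above.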

Table 6 contains the squared ratios of the empirical standard errors
to the theoretical standard errors for the five $\hat\theta$ closest to
the midpoint of the intervals, and within .1 of the midpoint. Two of the
groups had only two abilities within this restriction. If similar
caveats apply as for the item parameters, these ratios will have an F
distribution with five and $\infty$ degrees of freedom. Only eight of
the ratios are significant at the two-tailed 10% level, and only 16 are
greater than 1.


Table 4

Theoretical and Empirical Standard Errors in C

Item    σ_C          σ_C          s_C          s_C² / σ_C²
No.*    (θ known)                              (three replications)

 6      .056         .058         .063          .39      .44          2.79†
 7      .049         .050         .038          .40      .35           .95
 8      .037         .037         .045         3.08†     .76           .43
 9      .024         .025         .039          .80     4.71†         1.83
10      .025         .026         .034         2.24†    [illegible]    .27
11      .036         .037         .043          .98     2.67†          .41
13      .026         .027         .037          .89     1.88          2.90†
14      .019         .020         .028         2.98†    2.55†          .43
15      .015         .015         .016          .64     1.23          1.71

†Significant at 10 percent level.

*$C_1, \ldots, C_5$ and $C_{12}$ are treated as known.


Table 5

Theoretical and Empirical Standard Errors for θ

                  All C_i      C_1 to C_5 and C_12 treated as known
                  unknown
   θ      N_g     σ_θ          σ_θ|A,B,C    σ_θ          s_θ

-2.75      10     2.090        .951         .966         [illegible]
-2.25      35     1.296        .686         .699         1.134
-1.75      93      .861        .516         .525          .797
-1.25     219      .607        .400         .404          .427
 -.75     332      .456        .341         .342          .332
 -.25     326      .349        .295         .295          .279
  .25     227      .278        .262         .263          .274
  .75     136      .261        .260         .261          .286
 1.25      77      .303        .289         .290          .349
 1.75      25      .422        .384         .387          .412
 2.25       3      .628        .575         .580          *
 2.75       0      .931        .874         .878          *

*Not computed because of small N_g.


Table 6

F Ratios for θ

   θ

-2.75      3.73†    4.41†
-2.25       .85      .78      .43    11.34†    1.16
-1.75       .57     1.90     1.62      .32    18.95†
-1.25       .98      .63      .96      .95      .77
 -.75       .26      .94      .63      .81      .63
 -.25       .71     1.81      .73      .04†     .48
  .25       .18†     .98      .74      .80      .77
  .75       .61      .35     1.41     1.21      .64
 1.25      2.76†    1.82      .98     1.08     1.84
 1.75       .67      .41     1.08     1.45     1.78
 2.25       .11†     .36
 2.75*

†Significant at 10 percent level.

*There were no θ between 2.65 and 2.85.


Table 7 presents the theoretical standard errors of A, B, and
C, obtained by the method of Sections 2-4, when all $C_i$ must be
estimated from the data. It is interesting to compare these values with
those in Tables 2-4 where $C_1, \ldots, C_5$ and $C_{12}$ were treated
as known. We find that the standard errors of $B_1$ to $B_5$ are
increased drastically by ignorance of $C_1$ to $C_5$; all other
$\sigma(B_i)$ are much increased, except for i = 11, 13, and 14. All
$A_i$ show sharply increased standard errors. For items for which $C_i$
must be estimated, on the other hand, the standard errors of $C_i$ are
little affected by knowledge or ignorance of $C_1, \ldots, C_5, C_{12}$.
A likely explanation for this is that errors in estimating the scale
unit affect the standard errors of the $A_i$ and the $B_i$, but not of
the $C_i$.

We have found in Tables 2-7 some illustrative answers to the
question: How do estimation errors on one set of items affect the
accuracy of estimated parameters for a different set of items? Such
effects could not be quantified until now, since the standard error of
an item parameter estimate was previously known only for fixed $\theta$.
It is only through the sampling fluctuations of $\hat\theta$ that
estimation errors for one item can affect parameter estimates for
another item.

With 18 $C_i$ treated as known, the Fisher information matrix inverted
for this study has 3 x 45 - 18 + 1498 = 1615 rows and columns. The
matrix inversion by the method of Section 4 used 1232K bytes of memory on
an IBM 3031 and took 32 seconds. The computer program dealt with a
45-item test; it did not take advantage of the fact that the 45 items
consisted of 3 replicate sets of 15 items each.

Table 7

Standard Errors (2) of Item Parameters when
All $C_i$ Must be Estimated

In order to verify the numerical accuracy of the inversion, the
information matrix and the variance-covariance matrix were multiplied.
The result was an identity matrix accurate to 10 decimal places. The
variance-covariance matrix obtained in double precision agreed with the
matrix obtained in quadruple precision to all six decimal places printed.
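The accuracy check described above (multiply the information matrix by its computed inverse and compare with the identity) can be sketched as follows. This is an illustration on a toy 2 x 2 matrix with a known exact inverse, not the authors' program; the function name and numbers are hypothetical. The matrix order quoted in the text is also recomputed.

```python
def max_identity_error(a, b):
    """Largest absolute deviation of the product a * b from the identity matrix."""
    n = len(a)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(a[i][k] * b[k][j] for k in range(n))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst

# Order of the matrix inverted in the study, using the counts quoted in the text.
order = 3 * 45 - 18 + 1498

# Toy "information matrix" and its exact inverse (hypothetical numbers).
info = [[4.0, 1.0], [1.0, 3.0]]
det = 4.0 * 3.0 - 1.0 * 1.0
vcov = [[3.0 / det, -1.0 / det], [-1.0 / det, 4.0 / det]]

error = max_identity_error(info, vcov)
print(order, error < 1e-10)
```

A tolerance of 1e-10 corresponds to the "accurate to 10 decimal places" criterion reported above.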

9. Sampling Covariances and Correlations

When item parameters are known, $\hat\theta_a$ and $\hat\theta_b$
($a \neq b$) are uncorrelated. When ability parameters are known,
estimated item parameters for different items are uncorrelated. When
both item and ability parameters are estimated, in general all estimates
are correlated. The computer printout of the sampling correlations for
the present study consists of 10 correlation matrices. These need only
be summarized here.

Table 8 shows the theoretical (T) and empirical (E) correlations
between estimates of two different parameters for the same item. The
correlations are generally substantial. For comparison, the theoretical
correlations when the abilities are known are included. The empirical
correlations are obtained by dividing the estimated sampling covariance
by the square roots of the estimated sampling variances. If the
empirical correlations here have roughly 15 degrees of freedom, their
standard error is roughly $(1 - \rho^2)/\sqrt{15} = .26(1 - \rho^2)$. In
view of their standard errors, there is very satisfactory agreement of
empirical with theoretical correlations.
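The two computations in this paragraph, the empirical correlation and its rough standard error, are short enough to write out. The sketch below is illustrative only; the function names and the example numbers are hypothetical.

```python
import math

def correlation(cov_xy, var_x, var_y):
    """Correlation from a sampling covariance and the two sampling variances."""
    return cov_xy / math.sqrt(var_x * var_y)

def approx_se_of_correlation(rho, df=15):
    """Rough standard error (1 - rho^2) / sqrt(df) used in the text."""
    return (1.0 - rho ** 2) / math.sqrt(df)

r = correlation(0.5, 1.0, 4.0)  # hypothetical covariance and variances
print(r, round(approx_se_of_correlation(r), 3))
print(round(1.0 / math.sqrt(15), 2))  # the .26 factor quoted in the text
```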

Table 9 shows both theoretical and empirical correlations for
the $B_i$ (i = 1, 2, ..., 15). The corresponding standard errors are
given in parentheses in the diagonal. The only theoretical correlations
above .20 are among $B_1$, $B_2$, $B_3$, and $B_4$. These are the
four easiest items. Any error in estimating the scale unit
would seriously affect all these items in the same way. It is hard
to draw other useful generalizations from this table.


TABLE 9

EXPERIMENTAL (E) AND THEORETICAL (T) STANDARD ERRORS (DIAGONALS) AND
CORRELATIONS FOR TRANSFORMED B (DECIMAL POINTS OMITTED)

The corresponding table for the $A_i$ (i = 1, 2, ..., 15) shows
only 3 theoretical correlations above .20 (.27, .20, and .23). With two
exceptions (-.013 and -.002), all theoretical correlations are positive.

The highest theoretical correlation among the $C_i$ (i = 6, 7, ...,
11 and 13, 14, 15) is .04. All correlations are positive.

The theoretical correlations between $A_i$ and $B_j$ ($i \neq j$) are
all below .20 in absolute value, except for items 1-4, which vary from
.14 to .38. For $B_i$ and $C_j$ ($i \neq j$; $j \neq 1, 2, \ldots, 5, 12$)
there are no correlations above .25 in absolute value. For $A_i$ and
$C_j$, there are no correlations above .20 in absolute value.

The theoretical correlations between $\hat\theta_a$ and $\hat\theta_b$
($a \neq b$) are all less than .04 in absolute value. Between
$\hat\theta_a$ and $B_i$, the largest correlation in absolute value is
.15 (when i = 1 and $\theta_a = -2.25$). Between $\hat\theta_a$ and
$A_i$, the largest is .12 (when i = 1 and $\theta_a = -2.25$). Between
$\hat\theta_a$ and $C_i$, the largest is .06.


Summary

When both abilities and item parameters are unknown, the asymptotic
sampling variance-covariance matrix developed in this paper appears to
provide useful values for the standard errors needed for further
research in item response theory. The magnitudes of the numerical
values in the matrix were very much affected by the method used to
define the scale. For a set of artificial data, this variance-covariance
matrix compared satisfactorily with empirical results, and also with the
variance-covariance matrices found by the usual formulas for the case
where the abilities are known or where the item parameters are known.

With this matrix, the effect on other items of including items
with poorly determined parameters can be studied. Including items with
poorly determined c's increases the standard errors of all of the a's
and b's but not of the other c's. The effect of different distributions
of abilities on the accuracy of item parameters can also be studied.
Hopefully a goodness-of-fit test can now be developed for the
three-parameter model.

The standard errors of item parameters can now be studied for a
situation of common occurrence in equating and item banking: Each of
two tests containing common items is administered to a different group
of examinees; all parameters are estimated in the same LOGIST run.
It is of particular interest to determine how the number of common items
affects the standard error of the parameter estimates.